Mutual Information Criteria for Feature Selection
Authors
Abstract
In many data analysis tasks, one is often confronted with very high dimensional data. The feature selection problem is essentially a combinatorial optimization problem, which is computationally expensive. To overcome this problem, it is frequently assumed either that features influence the class variable independently or that they do so only through pairwise feature interactions. In prior work [18], we have explained the use of a new measure called multidimensional interaction information (MII) for feature selection. The advantage of MII is that it can consider third or higher order feature interactions. Using dominant set clustering, we can extract most of the informative features in the leading dominant sets in advance, limiting the search space for higher order interactions. In this paper, we provide a comparison of different similarity measures based on mutual information. Experimental results demonstrate the effectiveness of our feature selection method on a number of standard datasets.
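To make the mutual-information criterion concrete, the sketch below scores each feature by its estimated mutual information I(X_j; C) with the class and ranks features by that score. This is a minimal univariate illustration of the general idea only, not the paper's MII method; the plug-in estimator and the toy data are our own assumptions for the example.

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two discrete variables."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))  # joint probability
            px = np.mean(x == xv)                 # marginals
            py = np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi

# Toy data (hypothetical): feature 0 determines the class, feature 1 is noise.
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(0, 2, 200), rng.integers(0, 2, 200)])
y = X[:, 0]

scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
ranking = np.argsort(scores)[::-1]  # rank features by I(X_j; C), best first
```

Such univariate scores ignore feature interactions entirely; the point of MII is precisely to go beyond this by scoring groups of features jointly, at the cost of a larger search space.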
Similar resources
Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine
Different approaches have been proposed for feature selection to obtain a suitable feature subset from among all features. These methods search the feature space for feature subsets that satisfy some criteria or optimize several objective functions. The objective functions are divided into two main groups: filter and wrapper methods. In filter methods, feature subsets are selected according to some measu...
Evaluation of Mutual information versus Gini index for stable feature selection
The selection of highly discriminatory features has been crucial in aiding further advancements in domains such as biomedical sciences, high-energy physics and e-commerce. Therefore, evaluating the robustness of feature selection methods to small perturbations in the data, known as feature selection stability, is of great importance to people in these respective fields. However, little resear...
Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely, search-based and non-search-based procedures, as well as evaluation criteria and data mining tasks, are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
Forward feature selection using Residual Mutual Information
In this paper, we propose a hybrid filter/wrapper approach for fast feature selection using the Residual Mutual Information (RMI) between the function approximator output and the remaining features as selection criterion. This approach can handle redundancies in the data as well as the bias of the employed learning machine while keeping the number of required training and evaluation procedures ...
Weighted Mutual Information for Feature Selection
In this paper, we apply weighted mutual information for effective feature selection. The presented hybrid filter/wrapper approach resembles the well-known AdaBoost algorithm by focusing on those samples that are not classified or approximated correctly using the selected features. Redundancies and the bias of the employed learning machine are handled implicitly by our approach. In experiments, we c...